Semi-automatic phonetic labelling of large corpora

Authors

  • Odile Mella
  • Dominique Fohr
Abstract

This paper presents a methodology for semi-automatically labelling large corpora. The methodology rests on three main points: using several concurrent automatic stochastic labellers; decomposing the labelling of the whole corpus into an iterative refinement process; and building a labelling comparison procedure that takes phonological and acoustic-phonetic rules into account to evaluate the similarity of the various labellings of one sentence. After detailing these three points, we describe our HMM-based labelling tool and the application of the methodology to the Swiss French POLYPHON database.
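To make the third point concrete, here is a minimal sketch of how a rule-aware comparison between the outputs of several labellers could look. It is not the authors' procedure: the abstract gives no details, so the weighted edit distance, the rule tables `NEAR_EQUIVALENT` and `OPTIONAL`, the cost values, and the acceptance threshold are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical rule tables, for illustration only (not taken from the paper).
# Pairs of phone symbols treated as near-equivalent by an assumed rule:
NEAR_EQUIVALENT = {frozenset(("e", "E")), frozenset(("o", "O"))}
# Phones whose insertion or deletion is assumed to be a minor difference
# (for example, an optional schwa):
OPTIONAL = {"@"}


def sub_cost(a, b):
    """Cost of substituting phone a with phone b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in NEAR_EQUIVALENT:
        return 0.2  # assumed reduced cost for rule-sanctioned confusions
    return 1.0


def indel_cost(p):
    """Cost of inserting or deleting phone p."""
    return 0.2 if p in OPTIONAL else 1.0


def distance(x, y):
    """Weighted edit distance between two phone sequences."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = [0.0]
    for q in y:
        prev.append(prev[-1] + indel_cost(q))
    for p in x:
        cur = [prev[0] + indel_cost(p)]
        for j, q in enumerate(y, start=1):
            cur.append(min(
                prev[j] + indel_cost(p),         # delete p from x
                cur[j - 1] + indel_cost(q),      # insert q from y
                prev[j - 1] + sub_cost(p, q),    # match or substitute
            ))
        prev = cur
    return prev[-1]


def labellings_agree(labellings, threshold=0.5):
    """True if every pair of labellings is within the assumed threshold."""
    return all(distance(a, b) <= threshold
               for a, b in combinations(labellings, 2))
```

In an iterative scheme of the kind the abstract outlines, sentences for which `labellings_agree` returns true could be accepted automatically, while the remaining sentences would be passed on to a later refinement step or to manual checking. That split is likewise an assumption about how the pieces might fit together.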

Related resources

Automatic Phonetic Transcription of Large Speech Corpora

Most large speech corpora are delivered with a lexicon that contains a canonical transcription of every word in the orthographic transcription. Such a lexicon can be used for generating a hypothetical ‘canonical’ phonetic transcription from the orthography. In addition, time and money permitting, some speech corpora are provided with a manually verified broad phonetic transcription of at least ...

Automatic phonetic transcription of large speech corpora: a comparative study

This study investigates whether automatic transcription procedures can approximate manual phonetic transcriptions typically delivered with contemporary large speech corpora. We used ten automatic procedures to generate a broad phonetic transcription of well-prepared speech (read-aloud texts) and spontaneous speech (telephone dialogues). The resulting transcriptions were compared to manually ver...

Automatic generation of phonetic transcriptions for large speech corpora

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...

Semi-Automatic Annotation and Retrieval of Visual Content Using the Topic Map Technology

There are still major challenges in the area of automatic indexing and retrieval of multimedia content data for very large multimedia content corpora. Current indexing and retrieval applications still use keywords to index multimedia content and those keywords usually do not provide any knowledge about the semantic content of the data. With the increasing amount of multimedia content, it is ine...

The Prosogram: Semi-Automatic Transcription of Prosody Based on a Tonal Perception Model

This paper describes a system for semi-automatic transcription of prosody based on a stylization of the fundamental frequency data (contour) for vocalic (or syllabic) nuclei. The stylization is a simulation of tonal perception of human listeners. The system requires a time-aligned phonetic annotation. The transcription has been applied to several speech corpora.

Publication date: 1997